[FX.php List] Security Concerns with FileMaker Website
Joel Shapiro
jsfmp at earthlink.net
Thu Jan 25 12:20:38 MST 2007
Thanks for your explanation, Ed. And your efforts are certainly
commendable (I hope your higher-ups are aware!)
Just to clarify, then, if a site doesn't have confidential info or
email addresses & phone numbers, etc., there's no huge concern over
these bots. Anybody disagree?
Thanks,
-Joel
On Jan 25, 2007, at 11:04 AM, Edward L. Ford wrote:
> In the case of the website I started this thread about, it's hosted
> by a major academic institution (see my email address for a clue),
> so bandwidth isn't really a problem -- in August 2006, the number
> of pages on our servers was between 400k and 500k (not an
> exaggeration!) -- I don't know the stats, but I can only imagine
> our monthly data transfer is on the order of tens of terabytes, if
> not more.
>
> My main concern is about information confidentiality. The search
> engines can be told where and where not to go, but the nasty bots
> aren't so kind. Since the information provided on this site is
> valid contact info, it's sellable info that can be easily picked up
> by automated bots.
>
> The purpose of my site is essentially to be a "Musician White
> Pages", so that these people may be contacted by anyone within or
> outside the University for music jobs. Within this website,
> contact info such as phone numbers, email addresses, personal
> websites, and bios are provided, based on how much info people
> provide when they sign up to be listed.
>
> Now, when someone signs up for the service, they must agree to our
> privacy policy, which includes a statement:
> "By signing up for this service, you agree to release your contact
> information publicly through this website. Steps have been taken
> to prevent your contact information from showing up in public
> search engines, but the information is still listed on a public
> website, accessible to anyone in the world. By signing up for this
> service, you acknowledge the risks associated with listing contact
> information publicly."
>
> I have a robots.txt file that indicates that information about the
> service should be indexed, but the actual pages with people's info
> are not to be indexed.
>
> How nasty are these bots? I got an eye opener this past week. A
> new entry on these white pages had been public for only 6 hours
> before the email address received a spam message -- and this
> address was brand new, never ever published anywhere before. This
> spam message was sent to about 20 people, the majority of which
> were addresses listed on this site, so I know this email was
> harvested from that site.
>
> A new concern of mine is also regarding phone numbers. TXT spam
> isn't big yet, but let's face it, it's coming -- and so by similar
> harvesting methods, spammers can harvest phone numbers and start
> sending spam TXTs to them -- it costs them nothing, but it costs
> the users, and the cell companies have been slow so far to
> implement controls since it's more money in their pocket.
>
> So while these folks have agreed to the "dangers" of listing info
> publicly online and I could stop there and let things take their
> course, I feel it is my responsibility to do what I can to prevent
> abuse of the information these white pages provide, to protect the
> individuals using our service, while still maintaining a
> user-friendly system for those users who have legitimate needs for
> the contact information listed.
>
> I could go on about my position, but I think this is long enough :-)
> --Ed
> ------------------------------------
> http://www.edwardford.net
>
> On Jan 25, 2007, at 1:26 PM, Joel Shapiro wrote:
>
>> Can anyone briefly explain the risks of these bots and how
>> "nasty" (a term used in this thread) they can really be?
>>
>> I've looked at the link Gjermund posted, and it looks like the
>> biggest real problems are email harvesting and use of bandwidth.
>> Ed's original post was because he has many emails and websites on
>> his site, but if there are just one or two contact email addresses
>> on a site, what's the big risk?
>>
>> Thanks for enlightening me.
>>
>> -Joel
>>
>>
>> On Jan 25, 2007, at 10:11 AM, Edward L. Ford wrote:
>>
>>> I didn't think about checking user agents. I just did a cursory
>>> investigation of this, and I may personally implement this
>>> system. For the bots that say they're a real browser, I'll set
>>> up other roadblocks to keep the info on my site (relatively)
>>> protected from harvesting.
>>>
>>> For others interested in checking user agents, I found the
>>> following to investigate:
>>> o) look in the $_SERVER['HTTP_USER_AGENT'] string. More info @
>>> http://us3.php.net/reserved.variables There's a built in
>>> get_browser() function in PHP, which I never knew.
>>>
>>> o) PEAR also has some tools: Check out
>>> http://pear.php.net/package/Net_UserAgent_Detect
>>>
>>> --Ed
>>>
>>> -----------------------------------
>>> http://www.edwardford.net
>>>
>>> On Jan 25, 2007, at 11:41 AM, Gjermund Gusland Thorsen wrote:
>>>
>>>> Alternative 3 works in theory but still leaves some bots crawling,
>>>> as the worst bots tell your webserver that they are the most
>>>> popular browser.
>>>>
>>>> ggt667
>>>>
>>>> On 1/24/07, Jason LEWIS <jasonlewis at weber.edu> wrote:
>>>>> Ed,
>>>>>
>>>>> There are a few things you can do:
>>>>>
>>>>> 1. Sue to get them to stop (I don't like this option because it
>>>>> makes enemies and takes a VERY long time.)
>>>>> 2. use a robots.txt (This option is good for bots that actually
>>>>> honor this. Google honors the robots.txt.)
>>>>> 3. Detect the type of browser before you give them anything and
>>>>> ship out bad information for unwanted browser types. (This is
>>>>> only good if the bot owner does not imitate browser variables.)
>>>>>
>>>>> As for my options, I prefer #3 as I can give the bots something
>>>>> that they are looking for, bad information. In perl, I use
>>>>> $ENV{HTTP_USER_AGENT}, but I am not sure how to call this in php.
>>>>> Anyone else know? A quick search returned nothing on this.
>>>>> Could this involve $HTTP_ENV_VARS?
>>>>>
>>>>> Jason
>>>>>
>>>>> >>> elford at cs.bu.edu 01/23/2007 10:18 PM >>>
>>>>> Hello everyone,
>>>>> In the past hour, I've done some analysis of various logs and
>>>>> emails, and I've come to a chilling realization that I've never had
>>>>> before about bots harvesting information from websites -- I knew it
>>>>> happened, but I never knew the scope of the problem until tonight --
>>>>> and this is a low traffic website!
>>>>>
>>>>> So, I have a website which contains a public listing of email
>>>>> addresses and websites from a FileMaker database. I want to stop
>>>>> unknown bots from crawling the site. All of the data comes out of
>>>>> FileMaker, nicely formatted as links for the end user's clicking
>>>>> convenience. I have a solution to keep email addresses from being
>>>>> harvested, but I was wondering if anyone knows of a way to prevent
>>>>> website addresses from being harvested, but still clickable as a
>>>>> hyperlink.
>>>>>
>>>>> I thought maybe a PHP redirect link, like redirect.php?id=16 where
>>>>> redirect puts a user at the website listed in record 16, but once
>>>>> the PHP is all said and done, we're still at the linked website, so
>>>>> that doesn't really prevent anything from being harvested.
>>>>>
>>>>> Is there a way to maybe detect if a link was actually clicked by a
>>>>> person, and not just passed through by an automated bot? PHP is
>>>>> preferable for such a solution -- JavaScript is too easy to turn
>>>>> off. Or, is there a way to specify that only bots from places like
>>>>> Google, Live, and Yahoo are allowed to crawl the site?
>>>>>
>>>>> Hopefully my predicament is clear. I need to solve this ASAP...
>>>>>
>>>>> --Ed
>>>>> ---------------------
>>>>> http://www.edwardford.net
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> FX.php_List mailing list
>>>>> FX.php_List at mail.iviking.org
>>>>> http://www.iviking.org/mailman/listinfo/fx.php_list
>>>>>
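For anyone reading the thread later, the user-agent check discussed above (Jason's option 3, using the $_SERVER['HTTP_USER_AGENT'] value Ed pointed to) might be sketched roughly as follows. This is only a minimal illustration, assuming PHP 7 or later. The function names classify_user_agent() and should_show_contact_info() are hypothetical, and the crawler tokens are an assumed, incomplete list for the engines named in the thread. As Gjermund notes, a bot that claims to be a popular browser will pass this check, so it can only serve as a first filter.

```php
<?php
// Sketch only: the function names and token list are assumptions made for
// illustration, not part of FX.php or of anyone's actual site.

/**
 * Classify a User-Agent string as 'search_engine', 'browser', or 'unknown'.
 */
function classify_user_agent(string $userAgent): string
{
    $ua = strtolower(trim($userAgent));
    if ($ua === '') {
        // No User-Agent header at all: treat as unknown.
        return 'unknown';
    }

    // Assumed tokens for the crawlers of Google, Live (MSN), and Yahoo.
    // Checked first, because these crawlers may also send "Mozilla/".
    $crawlerTokens = ['googlebot', 'msnbot', 'slurp'];
    foreach ($crawlerTokens as $token) {
        if (strpos($ua, $token) !== false) {
            return 'search_engine';
        }
    }

    // Common browsers identify themselves with a "Mozilla/" or "Opera/" prefix.
    if (strpos($ua, 'mozilla/') === 0 || strpos($ua, 'opera/') === 0) {
        return 'browser';
    }

    return 'unknown';
}

/**
 * Decide whether to print contact details, given an array shaped like $_SERVER.
 * Only clients that look like ordinary browsers get the real data.
 */
function should_show_contact_info(array $server): bool
{
    $ua = $server['HTTP_USER_AGENT'] ?? '';
    return classify_user_agent($ua) === 'browser';
}
```

On a listing page this could be called as should_show_contact_info($_SERVER), printing the phone numbers and addresses only when it returns true. Passing the array in, rather than reading $_SERVER inside the function, keeps the check easy to test from the command line.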